Synthetic Biology
Oxford University Press (OUP)
All preprints, ranked by how well they match Synthetic Biology's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Cai, Y.-M.; Witham, S.; Patron, N.
Sequence features, including the binding affinity of binding motifs for their cognate transcription factors, are important contributors to promoter behavior. The ability to predictably recode affinity enables the development of synthetic promoters with varying levels of response to known cellular signals. Here we describe a luminescence-based microplate assay for comparing the interactions of transcription factors with short DNA probes. We then demonstrate how these data can be used to design synthetic plant promoters of varying strengths that respond to the same transcription factor.
Myers, Z. A.; Swain, S.; Bialek, S.; Keltner, S.; Holt, B. F.
Transcription factors (TFs) are fundamental components of biological regulation, facilitating the basal and differential gene expression necessary for life. TFs exert transcriptional regulation through interactions with both DNA and other TFs, ultimately influencing the action of RNA polymerase at a genomic locus. Current approaches are proficient at identification of binding site requirements for individual TFs, but few methods have been adapted to study oligomeric TF complexes. Further, many approaches that have been turned toward understanding DNA binding of TF complexes, such as electrophoretic mobility shift assays, require protein purification steps that can be burdensome or scope-limiting when considering more exhaustive experimental design. In order to address these shortfalls and to facilitate a more streamlined approach to understanding DNA binding by TF complexes, we developed the DIMR (Dynamic, Interdependent TF binding Molecular Reporter) system, a modular, yeast-based synthetic transcriptional activity reporter. As a proof of concept, we focused on the NUCLEAR FACTOR-Y (NF-Y) family of obligate heterotrimeric TFs in Arabidopsis thaliana. The DIMR system was able to reproduce the strict DNA-binding requirements of an experimentally validated NF-YA2/B2/C3 complex with high fidelity, including recapitulation of previously characterized mutations in subunits that either break NF-Y complex interactions or are directly involved in DNA binding. The DIMR system is a novel, powerful, and easy-to-use approach to address questions regarding the binding of oligomeric TFs to DNA.
One sentence summary: The DIMR system provides an accessible and easy-to-use platform to elucidate DNA binding and transcriptional regulatory capacity of oligomeric transcription factor complexes.
Yanez Feliu, G.; Earle Gomez, B.; Berrocal, V. C.; Munoz Silva, M.; Nunez, I. N.; Matute, T. F.; Arce Medina, A.; Vidal, G.; Vidal Cespedes, C.; Dahlin, J.; Federici, F.; Rudge, T. J.
Characterization is fundamental to the design, build, test, learn (DBTL) cycle for engineering synthetic genetic circuits. Components must be described in such a way as to account for their behavior in a range of contexts. Measurements and associated metadata, including part composition, constitute the test phase of the DBTL cycle. These data may consist of measurements of thousands of circuits, measured in hundreds of conditions, in multiple assays potentially performed in different labs and using different techniques. In order to inform the learn phase this large volume of data must be filtered, collated, and analyzed. Characterization consists of using this data to parameterize models of component function in different contexts, and combining them to predict behaviors of novel circuits. Tools to store, organize, share, and analyze large volumes of measurement and metadata are therefore essential to linking the test phase to the build and learn phases, closing the loop of the DBTL cycle. Here we present such a system, implemented as a web app with a backend data registry and analysis engine. An interactive frontend provides powerful querying, plotting and analysis tools, and we provide a REST API and Python package for full integration with external build and learn software. All measurements are associated to circuit part composition via SBOL. We demonstrate our tool by characterizing a range of genetic components and circuits according to composition and context.
Liya, D. H.; Elanchezhian, M.; Pahari, M.; Anand, N. M.; Suresh, S.; Balaji, N.; Jainarayanan, A. K.
Promoters play a key role in transcriptional regulation, fine-tuning the expression of genes. Heterologous promoter engineering has been widely used to control the level of transcription in all model organisms. The strength of a promoter is mainly determined by its nucleotide composition. Many promoter libraries have been curated, but few studies have attempted to develop theoretical methods to predict the strength of promoters from their nucleotide sequences. Such theoretical methods are not only valuable in the design of promoters with specified strength, but are also meaningful for understanding the mechanism of promoters in gene transcription. In this study, we present a theoretical model to describe the relationship between promoter strength and nucleotide sequence in Saccharomyces cerevisiae. We infer from our analysis that the -49 to 10 sequence with respect to the Transcription Start Site represents the minimal region that can be used to predict the promoter strength. We present an online tool, https://qpromoters.com/, that takes advantage of this fact to quickly quantify the strength of promoters.
Schuster, L.; Mejia, C.; Trujillo Rodriguez, L.; Kairalla, E.; Reisch, C. R.; Chevrette, M. G.; Dias, R.
Although research on promoters has spanned decades, the precise prediction of promoter activity from DNA sequence remains a challenge even in model organisms. Recent literature has identified important differences in the core sequence of σ70 promoters across classes of Proteobacteria as well as a lack of transferability when promoters are moved from host to host. Currently, there is a need for synthetic constitutive promoters spanning a range of expression levels in species outside of Escherichia coli. Additionally, characterization data defining behavior of the same promoter across multiple species would be extremely valuable to the field. Here, we analyzed promoter activity in three classes of Proteobacteria, which enabled us to better understand the sequence elements correlated with a strong promoter in different hosts. In doing so, we identified and characterized constitutive promoters spanning a range of expression in these species for community use and described the portability of a subset of these promoters as they were moved between hosts. These promoter libraries have broad applications as predictable genetic tools to control gene expression in diverse species (1-3). This work adds to the toolkit for gene expression in non-model bacteria and is a step towards the larger goal of accurate promoter prediction in a given host from a de novo sequence.
Dash, S.; Jagadeesan, R.; Baptista, I. S. C.; Chauhan, V.; Kandavalli, V.; Oliveira, S. M. D.; Ribeiro, A. S.
The topology of the transcription factor network (TFN) of E. coli is far from uniform, with 22 global regulator (GR) proteins controlling one-third of all genes. So far, their production rates cannot be tracked by comparable fluorescent proteins. We developed a library of fluorescent reporters for 16 GRs for this purpose. Each consists of a single-copy plasmid coding for GFP fused to the full-length copy of the native promoter. We tracked their activity in exponential and stationary growth, as well as under weak and strong stresses. We show that the reporters have high sensitivity and specificity to all stresses tested and detect single-cell variability in transcription rates. Given the influence of GRs on the TFN, we expect that the new library will contribute to dissecting global transcriptional stress-response programs of E. coli. Moreover, the library can be invaluable in bioindustrial applications that tune those programs to, instead of cell growth, favor productivity while reducing energy consumption.
Greenwood, M.; Reardon, K. F.; Prasad, A.
Reporter cell assays, such as those used to detect estrogenic chemicals, can detect target chemicals at low concentrations and can be used to analyze chemical mixtures without a priori knowledge of the mixture components. However, the outputs of these assays are affected by biological variability, which complicates their interpretation. Here, we describe and demonstrate a workflow that is useful for determining potential sources of biological variability and optimizing the performance of cell-based assays. The workflow involves developing an appropriate mathematical model for a transcriptional activation assay, calibrating it with experimental data, and conducting sensitivity analysis to characterize individual components of the genetic circuit based on their effect on the reporter signal output. This workflow was tested using an estrogen receptor transcriptional activation assay. For this circuit, our analysis predicts that controlling estrogen response element number, promoter strength, and reporter signal degradation rates minimizes reporter output variability. We show that careful model development, calibration, and analysis can offer biologically relevant insights to minimize the variability of cell-based assays and improve genetic circuits for increased sensitivity and dynamic range.
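The sensitivity-analysis step of such a workflow can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' model: the steady-state function `reporter_signal`, its parameter names, and all numeric values are hypothetical and chosen so the result is easy to check by hand.

```python
def reporter_signal(params):
    """Toy steady-state reporter output (hypothetical model, not from the paper)."""
    # Fraction of receptor occupied by ligand (simple one-site binding).
    occupancy = params["ligand"] / (params["ligand"] + params["kd"])
    # Production scales with response-element number and promoter strength.
    production = params["n_ere"] * params["promoter_strength"] * occupancy
    # Steady state: production balanced by first-order degradation.
    return production / params["degradation_rate"]


def relative_sensitivities(model, params, rel_step=1e-4):
    """Normalized local sensitivities d(ln output)/d(ln parameter),
    estimated with central finite differences."""
    base = model(params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1 + rel_step)})
        down = dict(params, **{name: value * (1 - rel_step)})
        result[name] = ((model(up) - model(down)) / base) / (2 * rel_step)
    return result


# Hypothetical parameter values, for illustration only.
example_params = {
    "ligand": 1.0,
    "kd": 1.0,
    "n_ere": 3.0,
    "promoter_strength": 2.0,
    "degradation_rate": 0.5,
}
sensitivities = relative_sensitivities(reporter_signal, example_params)
for name, value in sorted(sensitivities.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

In this toy model a sensitivity of +1 means the output scales proportionally with the parameter, and -1 means it scales inversely. A calibrated model would replace `reporter_signal` while the finite-difference routine stays the same.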
Clark, L.; Voigt, C.; Jewett, M. C.
Plastid engineering offers the potential to carry multi-gene traits in plants; however, it requires reliable genetic parts to balance expression. The difficulty of chloroplast transformation and slow plant growth make it challenging to build plants just to characterize genetic parts. To address these limitations, we developed a cell-free system from Nicotiana tabacum chloroplast extracts for prototyping genetic parts. Our cell-free system uses combined transcription and translation driven by T7 RNA polymerase and works with plasmid or linear template DNA. To develop our system, we optimized lysis, extract preparation procedures (e.g., runoff reaction, centrifugation, and dialysis), and the physicochemical reaction conditions. Our cell-free system can synthesize 34 ± 1 μg/mL luciferase in batch reactions. We apply our system to test a library of 104 ribosome binding site (RBS) variants and rank them based on cell-free gene expression. We observe a 1300-fold range of luciferase expression normalized by mRNA expression, as assessed by the malachite green aptamer (relative luminescence units per relative fluorescence units). We also find a positive correlation between the observed expression in chloroplast extracts and the predictions made by the RBS calculator. We anticipate that chloroplast cell-free systems will increase the speed and reliability of building genetic programs in plant chloroplasts for diverse applications.
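The normalization described above (luminescence divided by aptamer fluorescence, followed by ranking and a fold-range calculation) can be sketched as follows. The variant names and readings are invented for illustration and are not data from the study; the helper names are likewise hypothetical.

```python
def normalized_expression(luminescence, fluorescence):
    """Protein signal per unit mRNA signal (RLU per RFU)."""
    if fluorescence <= 0:
        raise ValueError("fluorescence must be positive to normalize")
    return luminescence / fluorescence


def fold_range(values):
    """Ratio of the largest to the smallest positive value."""
    positive = [v for v in values if v > 0]
    return max(positive) / min(positive)


# Hypothetical readings: variant -> (luminescence in RLU, fluorescence in RFU).
readings = {
    "rbs_a": (5200.0, 40.0),
    "rbs_b": (100.0, 50.0),
    "rbs_c": (26000.0, 20.0),
}

scores = {name: normalized_expression(lum, fluo) for name, (lum, fluo) in readings.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking)                      # variants from strongest to weakest
print(fold_range(scores.values()))  # spread of normalized expression
```

Dividing by the mRNA signal separates translational effects of the RBS from differences in transcript abundance, which is why the ranking is computed on the ratio and not on raw luminescence.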
Gerngross, D.; Beerenwinkel, N.; Panke, S.
Controlling the expression levels of multiple recombinant proteins for optimal performance is crucial for synthetic biosystems but remains difficult given the large number of DNA-encoded factors that influence the process of gene expression from transcription to translation. In bacterial hosts, biosystems can be economically encoded as operons, but the sequence requirements for exact tuning of expression levels in an operon remain unclear. Here, we demonstrate the extent and predictability of protein-level variation using diverse arrangements of twelve genes to generate 88 synthetic operons with up to seven genes at varying inducer concentrations. The resulting 2772 protein expression measurements allowed the training of a sequence-based machine learning model that explains 83% of the variation in the data with a mean absolute error of 9% relative to reference constructs, making it a useful tool for protein expression prediction. Feature importance analysis indicates that operon length, gene position and gene junction structure are of major importance for protein expression.
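For readers less familiar with the two reported metrics, the sketch below shows how they are commonly computed. It assumes that "explains 83% of the variation" refers to the coefficient of determination (R²), which the abstract does not state explicitly; the observed and predicted values are invented toy numbers.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observation and prediction."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy data (hypothetical), e.g. expression relative to a reference construct.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(r_squared(observed, predicted))
print(mean_absolute_error(observed, predicted))
```

The two numbers answer different questions: R² describes how much of the spread across constructs the model captures, while the mean absolute error describes the typical size of an individual prediction error.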
Vaisbourd, E.; Bren, A.; Alon, U.; Glass, D. S.
Plasmids are an essential tool for basic research and biotechnology applications. To optimize plasmid-based circuits, it is crucial to control plasmid integrity, including the formation of plasmid multimers. Multimers are tandem repeats of entire plasmids formed during replication by failed dimer resolution. Multimers can affect the behavior of synthetic circuits, especially ones that include DNA-editing enzymes. However, occurrence of multimers is not commonly assayed. Here we survey four commonly used plasmid backbones for occurrence of multimers in cloning (JM109) and wild-type (MG1655) strains. We find that multimers occur appreciably only in MG1655, with the fraction of plasmids existing as multimers increasing with both plasmid copy number and culture passaging. In contrast, introduction of multimers into JM109 can produce strains containing only multimers. We present an MG1655 ΔrecA single-locus knockout that avoids multimer production. These results can aid synthetic biologists in improving design and reliability of plasmid-based circuits.
Dods, G.; Gomez-Schiavon, M.; El-Samad, H.; Ng, A. H.
Mathematical models can aid the design of genetic circuits, but may yield inaccurate results if individual parts are not modeled at the appropriate resolution. To illustrate the importance of this concept, we study transcriptional cascades consisting of two inducible synthetic transcription factors connected in series. Despite the simplicity of this design, we find that accurate prediction of circuit behavior requires mapping the dose responses of each circuit component along the dimensions of both its expression level and its inducer concentration. With such multidimensional characterizations, we were able to computationally explore the behavior of 16 different circuit designs. We experimentally verified a subset of these predictions and found substantial agreement. This method of biological part characterization enables the use of models to identify (un)desired circuit behaviors prior to experimental implementation, thus shortening the design-build-test cycle for more complex circuits.
Competing Interest Statement: The authors have declared no competing interest.
Abbreviations: iSynTF, inducible synthetic transcription factor; YFP, yellow fluorescent protein; GEM, Gal4 DNA binding domain, estradiol ligand binding domain, Msn2 activating domain; Z3PM, Z3 DNA binding domain, progesterone ligand binding domain, Msn2 activating domain; Z4EM, Z4 DNA binding domain, estradiol ligand binding domain, Msn2 activating domain.
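The idea of characterizing each component along both its expression level and its inducer concentration can be illustrated with a toy two-stage cascade. This is a sketch under strong assumptions and not the model used in the study: the Hill-function form, the multiplicative coupling of expression and inducer, and every parameter name and value are hypothetical.

```python
def hill(x, k, n):
    """Activating Hill function with half-maximal constant k and coefficient n."""
    return x**n / (k**n + x**n)


def active_tf(expression, inducer, k_inducer, n_inducer):
    """Amount of active transcription factor: depends on BOTH how much
    factor is expressed and how much inducer is present (assumed form)."""
    return expression * hill(inducer, k_inducer, n_inducer)


def cascade_output(expr1, inducer1, inducer2, stage1, stage2):
    """Reporter output of two inducible factors connected in series."""
    # Stage 1: active factor 1 drives expression of factor 2.
    a1 = active_tf(expr1, inducer1, stage1["k_inducer"], stage1["n_inducer"])
    expr2 = stage1["max_output"] * hill(a1, stage1["k_promoter"], stage1["n_promoter"])
    # Stage 2: active factor 2 drives the reporter.
    a2 = active_tf(expr2, inducer2, stage2["k_inducer"], stage2["n_inducer"])
    return stage2["max_output"] * hill(a2, stage2["k_promoter"], stage2["n_promoter"])


# Hypothetical, unit-valued parameters so the result can be checked by hand.
unit_stage = {
    "k_inducer": 1.0, "n_inducer": 1.0,
    "k_promoter": 1.0, "n_promoter": 1.0,
    "max_output": 1.0,
}

print(cascade_output(1.0, 1.0, 1.0, unit_stage, unit_stage))
```

Because the second factor's expression level is itself set by the first stage, a dose response measured at a single fixed expression level would not be enough to predict the cascade, which is the point the abstract makes about multidimensional characterization.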
Scutteri, L.; Barth, P.; Rahi, S. J.
Many plasmids harbor unnecessary elements that complicate or hinder cloning tasks such as inserting one gene into another for protein domain grafting. In particular, restriction sites may be present in the backbone outside the polylinker region (multiple cloning site; MCS) and thus unavailable for use, and the overall length of a plasmid correlates with poorer ligation efficiency. To address these concerns, there has been a growing interest in minimal plasmids. Here, we describe the design and validation of a collection of six minimal integrating shuttle vectors for genetic manipulation in Saccharomyces cerevisiae. We constructed the plasmids using de novo gene synthesis; they consist only of a yeast selection marker (HIS3, TRP1, LEU2, URA3, natMX6, or KanMX), a bacterial selection marker (Ampicillin resistance), an origin of replication (ORI), and the MCS flanked by M13 forward and reverse sequences. We used truncated variants of these elements where available and eliminated all other sequences typically found in plasmids. The MCS consists of ten unique restriction sites. To our knowledge, at sizes ranging from approximately 2.6 kb to 3.5 kb, these are the smallest shuttle vectors described for yeast. Further, we removed common restriction sites in the open reading frames (ORFs) and terminators, freeing up approximately 30 cut sites in each plasmid. We named our pLS series in accordance with the well-known pRS vectors, which are on average 63% larger: pLS403 (HIS3), pLS404 (TRP1), pLS405 (LEU2), pLS406 (URA3), pLS408 (natMX6), and pLS410 (KanMX). These minimal vector backbones open up new opportunities for efficient molecular biology and genetic manipulation in Saccharomyces cerevisiae.
Roots, C. T.; Barrick, J. E.
Foundational techniques in molecular biology--such as cloning genes, tagging biomolecules for purification or identification, and overexpressing recombinant proteins--rely on introducing non-native or synthetic DNA sequences into organisms. These sequences may be recognized by the transcription and translation machinery in their new context in unintended ways. The cryptic gene expression that sometimes results has been shown to produce genetic instability and mask experimental signals. Computational tools have been developed to predict individual types of gene expression elements, but it can be difficult for researchers to contextualize their collective output. Here, we introduce CryptKeeper, a software pipeline that visualizes predictions of bacterial gene expression signals and estimates the translational burden possible from a DNA sequence. We investigate several published examples where cryptic gene expression in E. coli interfered with experiments. CryptKeeper accurately postdicts unwanted gene expression from both eukaryotic virus infectious clones and individual proteins that led to genetic instability. It also identifies off-target gene expression elements that resulted in truncations that confounded protein purification. Incorporating negative design using CryptKeeper into reverse genetics and synthetic biology workflows can help to mitigate cloning challenges and avoid unexplained failures and complications that arise from unintentional gene expression.
Starkey, F.; Menolascina, F.
Synthetic biology aims to rationally engineer biological systems. Current methods often employ an initial human-designed circuit topology and utilise iterative approaches, e.g. directed evolution, to fine-tune part function. This approach can be extremely time-consuming and resource-intensive whilst often reaching sub-optimal solutions. A design workflow in which circuits and parts are designed in silico can overcome such limitations. Here we describe a method to automatically design synthetic gene circuits with user-specified dynamics; unlike some previous contributions, our algorithm is able to design circuits with analog, not just digital, behaviours. We demonstrate the capabilities of our approach by benchmarking it on a number of gene circuit design tasks. We review and compare the performance of our method against the state of the art and outline future opportunities for development. Finally, to foster adoption, we make our algorithm available through a web app.
Acelas, A.; Palya, H.; Flyangolts, K.; Fady, P.-E.; Nelson, C.
Legitimacy screening, the process of verifying the identity and purpose of customers ordering synthetic nucleic acids, is a primary safeguard against the misuse of synthetic biology. However, the associated costs discourage the adoption of screening practices. To evaluate whether AI tools can facilitate this process, we tested five large language models on five verification tasks using customer profiles of life sciences researchers from around the world. We compared AI performance against an expert human baseline on flag accuracy, source quality, source fidelity, and cost. The best-performing model, Gemini 2.5 Pro aided by four bibliographic and sanctions APIs, achieved comparable flag accuracy to the human baseline (90% and 89%, respectively). Gemini 2.5 Pro outperformed the human baseline on source quality and fidelity, at roughly one-tenth of the cost ($1.18 vs. $14.04 per customer). For information-gathering tasks, which excluded the human review step, costs averaged $0.23 per customer, around 50 times cheaper than human screening. These results support piloting AI-assisted legitimacy screening at providers of synthetic nucleic acids and other dual-use biotechnology products, with AI systems handling information gathering and human reviewers retaining authority over order fulfillment decisions.
Cronan, G. E.; Kuzminov, A.
Protein degron tags have proven uniquely useful for characterization of gene function. Degrons mediate quick depletion, usually within minutes, of a protein of interest, allowing researchers to characterize cellular responses to the loss of function. To develop a general purpose degron tool in E. coli, we sought to build upon a previously characterized system of SspB-dependent inducible protein degradation. For this, we created a family of expression vectors containing a destabilized allele of SspB, capable of a rapid and nearly perfect "off-to-on" induction response. Using this system, we demonstrated control over several enzymes of DNA metabolism, but also found, with other substrates, apparent limitations of an SspB-dependent system. Several degron target proteins were degraded too slowly to effect their complete depletion during active growth, whereas others appeared completely refractory to degron-promoted degradation. We demonstrated that a model substrate, beta-galactosidase, was positively recognized as a degron substrate but failed to be degraded by the ClpXP protease, demonstrating an apparently unknown mechanism of protease resistance. Thus, only a minority of our, admittedly biased, selection of degron substrates proved amenable to rapid SspB-catalyzed degradation. We conclude that substrate-dependence of the SspB system remains a critical factor for the success of this degron system. For substrates that prove degradable, we provide a series of titratable SspB-expression vehicles.
McGuffie, M. J.; Barrick, J. E.
Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science. 
Author Summary: Plasmids are used in molecular biology and biotechnology for a wide variety of tasks such as cloning DNA, expressing recombinant proteins, and creating vaccines. One challenge in working with plasmids is that there has been a long and often lost history of pieces of plasmids being copied and remixed by researchers to create new plasmids. Current databases used for annotating key genetic parts in plasmids are incomplete, especially with respect to cataloguing closely related versions of parts that can have very different characteristics. Some genetic part variants have arisen due to purposeful editing while others are the result of unplanned mutations and evolution. When a researcher finds differences between a database sequence and a genetic part in their newly constructed plasmid, it is often unclear how and when it arose and whether it will affect their experiments. We identified 217 genetic part variants that are either widespread or have likely arisen independently more than once on plasmids due to convergent evolution or engineering. We propose that these variants should be prioritized for inclusion in curated databases of engineered DNA sequences and for functional characterization to improve the reliability and reproducibility of science.
Cummins, B.; Moseley, R. C.; Deckard, A.; Weston, M.; Zheng, G.; Bryce, D.; Nowak, J.; Gameiro, M.; Gedeon, T.; Mischaikow, K.; Beal, J.; Johnson, T.; Vaughn, M.; Gaffney, N. I.; Gopaulakrishnan, S.; Urrutia, J.; Goldman, R. P.; Bartley, B.; Nguyen, T. T.; Roehner, N.; Mitchell, T.; Vrana, J. D.; Clowers, K. J.; Maheshri, N.; Becker, D.; Mikhalev, E.; Biggers, V.; Higa, T.; Mosqueda, L.; Haase, S. B.
A challenge in the design and construction of synthetic genetic circuits is that they will operate within biological systems that have noisy and changing parameter regimes that are largely unmeasurable. The outcome is that these circuits do not operate within design specifications or have a narrow operational envelope in which they can function. This behavior is often observed as a lack of reproducibility in function from day to day or lab to lab. Moreover, this narrow range of operating conditions does not promote reproducible circuit function in deployments where environmental conditions for the chassis are changing, as environmental changes can affect the parameter space in which the circuit is operating. Here we describe a computational method for assessing the robustness of circuit function across broad parameter regions. Previously designed circuits are assessed by this computational method and then circuit performance is measured across multiple growth conditions in budding yeast. The computational predictions are correlated with experimental findings, suggesting that the approach has predictive value for assessing the robustness of a circuit design.
Holston, A. S.; Hinton, S. R.; Lindley, K. A.; Kearns, N. C.; Plesa, C.
Protein engineering efforts often involve the creation of hybrid or chimeric proteins, where functionality critically hinges on the precise design of linkers and fusion points. Traditional methods have been constrained by a focus on single genes or the random selection of fusion points. Here we introduce an approach which enables the creation of large gene libraries where each library comprises a multitude of diverse, specifically designed genes, each with a corresponding set of programmatically designed fusion points or linkers. When combined with multiplex functional assays, these libraries facilitate the derivation of generalized engineering principles applicable across whole protein families or domain types. Degenerate DropSynth is a multiplex gene synthesis technique which allows for the assembly of up to eight distinct variants for each of the 1,536 designed parent genes in a single reaction. We assemble chimeric sensor histidine kinases and demonstrate the assembly of genes up to 1 kbp in length with an 8% rate of perfect assemblies per gene. Our findings indicate that incorporating an increased number of variants in droplets containing barcoded beads does not significantly affect the rate of perfect assemblies. However, maintaining a consistent level of degeneracy across the library is important to ensure good coverage and reduce inequality. The results suggest the potential for scaling this process to assemble at least 8,000 distinct variants in a single reaction. Degenerate DropSynth enables the systematic exploration of protein families through large-scale, programmable assembly of chimeric proteins, moving beyond the limitations of individual protein studies.
Bryant, J. A.; Wright, R. C.
Golden Gate assembly is a requisite method in synthetic biology that facilitates critical conventions such as genetic part abstraction and rapid prototyping. However, compared to robotic implementation, manual Golden Gate implementation is cumbersome, error-prone, and inconsistent for complex assembly designs. AssemblyTron is an open-source Python package that provides an affordable automation solution using open-source Opentrons OT-2 lab robots. Automating Golden Gate assembly with AssemblyTron can reduce failure rate, resource consumption, and training requirements for building complex DNA constructs, as well as indexed and combinatorial libraries. Here, we dissect a panel of upgrades to AssemblyTron's Golden Gate assembly capabilities, which include Golden Gate assembly into modular cloning part vectors, error-prone PCR combinatorial mutant library assembly, and modular cloning indexed plasmid library assembly. These upgrades enable a broad pool of users with varying levels of experience to readily implement advanced Golden Gate applications using low-cost, open-source lab robotics.